Selecting and Generating Computational Meaning Representations for Short Texts
Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives, methodology, systems, and applications, and show how each contributes to a fuller understanding of the task.
Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/143967/1/cfdollak_1.pd
Sentence simplification, compression, and disaggregation for summarization of sophisticated documents
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/134176/1/asi23576.pd
GVdoc: Graph-based Visual Document Classification
The robustness of a model for real-world deployment is determined by how well it
performs on unseen data and distinguishes between in-domain and out-of-domain
samples. Visual document classifiers have shown impressive performance on
in-distribution test sets. However, they tend to have a hard time correctly
classifying and differentiating out-of-distribution examples. Image-based
classifiers lack the text component, whereas multi-modality transformer-based
models face the token serialization problem in visual documents due to their
diverse layouts. They also require a lot of computing power during inference,
making them impractical for many real-world applications. We propose GVdoc, a
graph-based document classification model that addresses both of these
challenges. Our approach generates a document graph based on its layout, and
then trains a graph neural network to learn node and graph embeddings. Through
experiments, we show that our model, even with fewer parameters, outperforms
state-of-the-art models on out-of-distribution data while retaining comparable
performance on the in-distribution test set.
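The core idea of the abstract above (build a graph from the document's layout, then let a graph neural network mix information between spatially related tokens) can be sketched in a few lines. This is a minimal illustrative sketch, not the GVdoc implementation: the node features, the distance-based edge rule, and the single round of mean-aggregation message passing are all simplifying assumptions.

```python
# Sketch of a layout graph: nodes are OCR tokens with box centers, edges
# connect tokens whose centers are spatially close, and one round of mean
# aggregation mixes each node's features with its neighbors'.
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    x: float  # bounding-box center, in pixels (assumed)
    y: float

def build_layout_graph(tokens, radius=50.0):
    """Connect tokens whose box centers lie within `radius` pixels."""
    edges = []
    for i, a in enumerate(tokens):
        for j, b in enumerate(tokens):
            if i < j and (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= radius ** 2:
                edges.append((i, j))
    return edges

def message_pass(features, edges):
    """One round of mean aggregation over neighbors (with self-loops)."""
    n, dim = len(features), len(features[0])
    neighbors = [[i] for i in range(n)]  # self-loop for every node
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i in range(n):
        agg = [0.0] * dim
        for j in neighbors[i]:
            for d in range(dim):
                agg[d] += features[j][d]
        out.append([v / len(neighbors[i]) for v in agg])
    return out

tokens = [Token("Invoice", 10, 10), Token("No.", 40, 12), Token("Total", 10, 400)]
edges = build_layout_graph(tokens)   # only the two nearby tokens are linked
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]
mixed = message_pass(feats, edges)
print(edges)     # [(0, 1)]
print(mixed[2])  # isolated node keeps its own features: [0.0, 0.5]
```

In a real model this aggregation would be a learned GNN layer repeated several times, followed by a graph-level pooling step to produce the document embedding used for classification; the point here is only that the graph follows layout rather than a serialized token order.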